Smart Paper Notes

Anomaly detection optimization using big data and deep learning to reduce false-positive

Khloud Al Jallad, Mohamad Aljnidi, Mohammad Said Desouki

Categories: Artificial Intelligence | Machine Learning

2022-09-28

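Anomaly-based intrusion detection systems (IDS) have long been a hot research topic because they can detect new threats, rather than only the memorized signature threats that signature-based IDS recognize. This matters all the more now that advanced technologies have increased the number of hacking tools and the impact of attacks. The problem with any anomaly-based model is its high false-positive rate, which is why anomaly-based IDS are not commonly used in practice: such models classify an unseen pattern as a threat even when it is normal behavior that simply was not included in the training dataset. This type of problem is called overfitting, where the model is unable to generalize. Optimizing anomaly-based models with a large training dataset that includes all possible normal cases might be an optimal solution, but it cannot be applied in practice. Although we can increase the number of training samples to include more normal cases, we still need a model with a greater ability to generalize. In this paper, we propose applying deep models instead of traditional models because they generalize better, so that using big data and deep models yields fewer false positives. We compare machine learning and deep learning algorithms for optimizing anomaly-based IDS by reducing the false-positive rate. We ran an experiment on the NSL-KDD benchmark and compared our results with one of the best classifiers used in traditional learning for IDS optimization. The experiment shows that using deep learning instead of traditional learning reduces false positives by 10%.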
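
The false-positive rate that this abstract targets can be made concrete with a short sketch. The function name, the label encoding (1 = attack, 0 = normal), and the toy data below are illustrative assumptions, not taken from the paper.

```python
def false_positive_rate(y_true, y_pred):
    """Return FP / (FP + TN), assuming 1 = attack and 0 = normal traffic."""
    # False positives: normal samples that the detector flagged as attacks.
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    # True negatives: normal samples that the detector left alone.
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    if fp + tn == 0:
        raise ValueError("no normal samples, so the rate is undefined")
    return fp / (fp + tn)


# Toy example (made-up labels): four normal samples, two attacks.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 1, 0, 1, 1, 0]
print(false_positive_rate(y_true, y_pred))  # 0.5
```

Only the normal samples enter the denominator, so a detector that has never seen a particular kind of normal traffic raises this rate without its attack detection changing at all.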

Big data analysis and distributed deep learning for next-generation intrusion detection system optimization

Khloud Al Jallad, Mohamad Aljnidi, Mohammad Said Desouki

Categories: Artificial Intelligence | Machine Learning

2022-09-28

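With the growing use of information technology in all areas of life, hacking has become more effective than ever before. As technology develops, the number of attacks grows exponentially every few months and attacks become more sophisticated, making traditional IDS inefficient. This paper proposes a solution that not only detects new threats with a higher detection rate and fewer false positives than IDS already in use, but can also detect collective and contextual security attacks. We achieve these results by using a Networking Chatbot, a deep recurrent neural network (Long Short-Term Memory, LSTM) on top of the Apache Spark framework, to flag anomalies. We propose merging the concepts of language processing, contextual analysis, distributed deep learning, big data, and anomaly detection in flow analysis. We propose a model that abstracts the network's normal behavior from sequences of millions of packets within their context and analyzes them in real time to detect point, collective, and contextual anomalies. Experiments were conducted on the MAWI dataset and show a better detection rate than not only signature-based IDS but also traditional anomaly-based IDS. The experiment shows fewer false positives, a higher detection rate, and better point anomaly detection. As for proof of contextual and collective anomaly detection, we discuss our claim and the reasoning behind our hypothesis. However, because of hardware limitations, the experiment was run on small random subsets of the dataset, so we share the experiment and our vision for future work in the hope that interested researchers with better hardware infrastructure than ours will complete the proof.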

HERDPhobia: A Dataset for Hate Speech against Fulani in Nigeria

Saminu Mohammad Aliyu, Gregory Maksha Wajiga, Muhammad Murtala, Shamsuddeen Hassan Muhammad, Idris Abdulmumin, Ibrahim Said Ahmad

Categories: Natural Language Processing

2022-11-28

Social media platforms allow users to freely share their opinions on any issue they choose. However, they also make it easier to spread hateful and abusive content. The Fulani ethnic group has been a victim of this unfortunate phenomenon. This paper introduces HERDPhobia, the first annotated hate speech dataset on Fulani herders in Nigeria, in three languages: English, Nigerian-Pidgin, and Hausa. We present a benchmark experiment using pre-trained language models to classify the tweets as either hateful or non-hateful. Our experiment shows that the XLM-T model performs best, with a weighted F1 of 99.83%. We released the dataset at https://github.com/hausanlp/HERDPhobia for further research.
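
For readers unfamiliar with the metric, weighted F1 is the per-class F1 score averaged with weights proportional to each class's share of the true labels. The sketch below computes it from scratch; the function name and the toy labels are illustrative assumptions and have nothing to do with the HERDPhobia data.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores."""
    support = Counter(y_true)  # number of true samples per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        # Each class contributes in proportion to its support.
        score += (n / total) * f1
    return score


# Toy example (made-up labels): 2 hateful and 4 non-hateful tweets.
y_true = ["hate", "hate", "non", "non", "non", "non"]
y_pred = ["hate", "non", "non", "non", "non", "hate"]
print(round(weighted_f1(y_true, y_pred), 4))  # 0.6667
```

Here the "hate" class scores an F1 of 0.5 and the "non" class 0.75, so the weighted result is (2/6)(0.5) + (4/6)(0.75) = 2/3. Because the majority class dominates the average, a high weighted F1 on an imbalanced dataset does not by itself show strong performance on the minority class.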

Look, Listen, and Attack: Backdoor Attacks Against Video Action Recognition

Hasan Abed Al Kader Hammoud, Shuming Liu, Mohammad Alkhrasi, Fahad AlBalawi, Bernard Ghanem

Categories: Computer Vision | Machine Learning

2023-01-03

Deep neural networks (DNNs) are vulnerable to a class of attacks called "backdoor attacks", which create an association between a backdoor trigger and a target label the attacker is interested in exploiting. A backdoored DNN performs well on clean test images, yet persistently predicts an attacker-defined label for any sample in the presence of the backdoor trigger. Although backdoor attacks have been extensively studied in the image domain, there are very few works that explore such attacks in the video domain, and they tend to conclude that image backdoor attacks are less effective in the video domain. In this work, we revisit the traditional backdoor threat model and incorporate additional video-related aspects into that model. We show that poisoned-label image backdoor attacks can be extended temporally in two ways, statically and dynamically, leading to highly effective attacks in the video domain. In addition, we explore natural video backdoors to highlight the seriousness of this vulnerability in the video domain. Finally, for the first time, we study multi-modal (audiovisual) backdoor attacks against video action recognition models, where we show that attacking a single modality is enough to achieve a high attack success rate.

Distributed Machine Learning for UAV Swarms: Computing, Sensing, and Semantics

Yahao Ding, Zhaohui Yang, Quoc-Viet Pham, Zhaoyang Zhang, Mohammad Shikh-Bahaei

Categories: Machine Learning | Artificial Intelligence

2023-01-03

Unmanned aerial vehicle (UAV) swarms are considered a promising technique for next-generation communication networks due to their flexibility, mobility, low cost, and ability to provide services collaboratively and autonomously. Distributed learning (DL) enables UAV swarms to intelligently provide communication services, multi-directional remote surveillance, and target tracking. In this survey, we first introduce several popular DL algorithms, such as federated learning (FL), multi-agent reinforcement learning (MARL), distributed inference, and split learning, and present a comprehensive overview of their applications for UAV swarms, such as trajectory design, power control, wireless resource allocation, user assignment, perception, and satellite communications. Then, we present several state-of-the-art applications of UAV swarms in wireless communication systems, such as reconfigurable intelligent surfaces (RIS), virtual reality (VR), and semantic communications, and discuss the problems and challenges that DL-enabled UAV swarms can solve in these applications. Finally, we describe open problems in using DL in UAV swarms and future research directions for DL-enabled UAV swarms. In summary, this survey provides a comprehensive overview of DL applications for UAV swarms across a wide range of scenarios.

An Event-based Algorithm for Simultaneous 6-DOF Camera Pose Tracking and Mapping

Masoud Dayani Najafabadi, Mohammad Reza Ahmadzadeh

Categories: Computer Vision

2023-01-02

Compared to regular cameras, Dynamic Vision Sensors or Event Cameras can output compact visual data based on a change in the intensity in each pixel location asynchronously. In this paper, we study the application of current image-based SLAM techniques to these novel sensors. To this end, the information in adaptively selected event windows is processed to form motion-compensated images. These images are then used to reconstruct the scene and estimate the 6-DOF pose of the camera. We also propose an inertial version of the event-only pipeline to assess its capabilities. We compare the results of different configurations of the proposed algorithm against the ground truth for sequences of two publicly available event datasets. We also compare the results of the proposed event-inertial pipeline with the state-of-the-art and show it can produce comparable or more accurate results provided the map estimate is reliable.

Russia-Ukraine war: Modeling and Clustering the Sentiments Trends of Various Countries

Hamed Vahdat-Nejad, Mohammad Ghasem Akbari, Fatemeh Salmani, Faezeh Azizi, Hamid-Reza Nili-Sani

Categories: Natural Language Processing

2023-01-02

With Twitter's growth and popularity, a huge number of views are shared by users on various topics, making this platform a valuable information source on various political, social, and economic issues. This paper investigates English tweets on the Russia-Ukraine war to analyze trends reflecting users' opinions and sentiments regarding the conflict. The tweets' positive and negative sentiments are analyzed using a BERT-based model, and the time series associated with the frequency of positive and negative tweets for various countries is calculated. Then, we propose a method based on the neighborhood average for modeling and clustering the time series of countries. The clustering results provide valuable insight into public opinion regarding this conflict. Among other things, we can mention the similar thoughts of users from the United States, Canada, the United Kingdom, and most Western European countries versus the shared views of Eastern European, Scandinavian, Asian, and South American nations toward the conflict.

In Quest of Ground Truth: Learning Confident Models and Estimating Uncertainty in the Presence of Annotator Noise

Asma Ahmed Hashmi, Artem Agafonov, Aigerim Zhumabayeva, Mohammad Yaqub, Martin Takáč

Categories: Computer Vision | Machine Learning

2023-01-02

The performance of deep learning (DL) models depends on the quality of their labels. In some areas, the involvement of human annotators may introduce noise into the data. When these corrupted labels are blindly regarded as the ground truth (GT), DL models suffer from degraded performance. This paper presents a method that aims to learn a confident model in the presence of noisy labels, in conjunction with estimating the uncertainty of multiple annotators. We robustly estimate the predictions given only the noisy labels by adding an entropy- or information-based regularizer to the classifier network. We conduct our experiments on noisy versions of the MNIST, CIFAR-10, and FMNIST datasets. Our empirical results demonstrate the robustness of our method, as it outperforms or performs comparably to other state-of-the-art (SOTA) methods. In addition, we evaluate the proposed method on a curated dataset in which the noise type and level of the various annotators depend on the input image style. We show that our approach performs well and is adept at learning annotators' confusion. Moreover, we demonstrate that our model is more confident in predicting the GT than other baselines. Finally, we assess our approach on a segmentation problem and showcase its effectiveness with experiments.

Graph Federated Learning for CIoT Devices in Smart Home Applications

Arash Rasti-Meymandi, Seyed Mohammad Sheikholeslami, Jamshid Abouei, Konstantinos N. Plataniotis

Categories: Machine Learning

2022-12-29

This paper deals with the problem of statistical and system heterogeneity in a cross-silo Federated Learning (FL) framework where there is a limited number of Consumer Internet of Things (CIoT) devices in a smart building. We propose a novel Graph Signal Processing (GSP)-inspired aggregation rule based on graph filtering, dubbed "G-Fedfilt". The proposed aggregator enables a structured flow of information based on the graph's topology. This behavior allows capturing the interconnection of CIoT devices and training domain-specific models. The embedded graph filter is equipped with a tunable parameter that enables a continuous trade-off between domain-agnostic and domain-specific FL. In the domain-agnostic case, it forces G-Fedfilt to act similarly to the conventional Federated Averaging (FedAvg) aggregation rule. The proposed G-Fedfilt also enables an intrinsic smooth clustering based on the graph connectivity, without the clusters being explicitly specified, which further boosts the personalization of the models in the framework. In addition, the proposed scheme uses communication-efficient time scheduling to alleviate the system heterogeneity. This is accomplished by adaptively adjusting the number of training data samples and the sparsity of the models' gradients to reduce communication desynchronization and latency. Simulation results show that the proposed G-Fedfilt achieves up to 3.99% better classification accuracy than the conventional FedAvg in terms of model personalization on the statistically heterogeneous local datasets, and up to 2.41% higher accuracy than FedAvg when testing the generalization of the models.
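
The idea of a graph-based aggregator with a tunable knob between local and averaged models can be illustrated with a minimal sketch. This is not the G-Fedfilt filter from the paper, whose exact form the abstract does not give. It is a hypothetical first-order filter that mixes each client's weights with the adjacency-weighted mean of its neighbours, and the function name, the `alpha` parameter, and the toy data are all assumptions.

```python
def graph_filter_aggregate(models, adjacency, alpha):
    """Hypothetical first-order graph filter over client models.

    models:    list of weight vectors, one per client
    adjacency: n x n matrix of non-negative edge weights
    alpha:     0.0 keeps every client's own model, 1.0 replaces it
               with the adjacency-weighted mean of its neighbours
    """
    n = len(models)
    dim = len(models[0])
    aggregated = []
    for i in range(n):
        degree = sum(adjacency[i])
        if degree == 0:
            # Isolated client: nothing to mix in, keep the local model.
            aggregated.append(list(models[i]))
            continue
        neighbour_mean = [
            sum(adjacency[i][j] * models[j][k] for j in range(n)) / degree
            for k in range(dim)
        ]
        aggregated.append(
            [(1 - alpha) * models[i][k] + alpha * neighbour_mean[k] for k in range(dim)]
        )
    return aggregated


# Toy example: three clients with two-dimensional "models".
models = [[0.0, 0.0], [3.0, 6.0], [6.0, 12.0]]
fully_connected = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]  # includes self-loops

print(graph_filter_aggregate(models, fully_connected, 1.0))
# [[3.0, 6.0], [3.0, 6.0], [3.0, 6.0]]
print(graph_filter_aggregate(models, fully_connected, 0.0))
# [[0.0, 0.0], [3.0, 6.0], [6.0, 12.0]]
```

On a fully connected graph with `alpha = 1.0`, every client receives the plain average, which matches FedAvg only under the extra assumption that all clients hold equally sized datasets. With a sparser graph, clients are pulled toward their neighbours only, which is the kind of domain-specific behavior the abstract describes.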

Data Augmentation using Transformers and Similarity Measures for Improving Arabic Text Classification

Dania Refai, Saleh Abo-Soud, Mohammad Abdel-Rahman

Categories: Natural Language Processing | Artificial Intelligence | Machine Learning

2022-12-28

Learning models depend heavily on data to work effectively, and they perform better when trained on large datasets. Extensive research in the literature addresses the dataset adequacy issue. One promising approach to solving it is data augmentation (DA). In DA, the number of training data instances is increased by applying different transformations to the available instances to generate new, correct, and representative ones. DA increases the dataset's size and variability, which improves model performance and prediction accuracy. DA also addresses the class imbalance problem in classification learning techniques. Few studies have recently considered DA for the Arabic language. These studies rely on traditional augmentation approaches, such as rule-based paraphrasing or noising-based techniques. In this paper, we propose a new Arabic DA method that employs a recent, powerful modeling technique, namely AraGPT-2, for the augmentation process. The generated sentences are evaluated in terms of context, semantics, diversity, and novelty using the Euclidean, cosine, Jaccard, and BLEU distances. Finally, the AraBERT transformer is used on sentiment classification tasks to evaluate the classification performance of the augmented Arabic dataset. The experiments were conducted on four Arabic sentiment datasets, namely AraSarcasm, ASTD, ATT, and MOVIE. The selected datasets vary in size, number of labels, and class imbalance. The results show that the proposed methodology improved Arabic sentiment text classification on all datasets, with an increase in F1 score of 4% on AraSarcasm, 6% on ASTD, 9% on ATT, and 13% on MOVIE.
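
Two of the similarity measures named above, Jaccard and cosine, can be sketched for a pair of sentences as follows. The whitespace tokenization and bag-of-words vectors are simplifying assumptions made for illustration; the abstract does not say which sentence representation the authors used, and the English example sentences are made up.

```python
import math
from collections import Counter


def jaccard_similarity(a, b):
    """Size of the shared vocabulary divided by the size of the combined vocabulary."""
    tokens_a, tokens_b = set(a.split()), set(b.split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty sentences are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def cosine_similarity(a, b):
    """Cosine of the angle between the word-count vectors of two sentences."""
    counts_a, counts_b = Counter(a.split()), Counter(b.split())
    dot = sum(counts_a[token] * counts_b[token] for token in counts_a)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


original = "the film was great"
generated = "the film was boring"
print(jaccard_similarity(original, generated))  # 0.6
print(cosine_similarity(original, generated))   # 0.75
```

For augmentation, scores near 1 suggest the generated sentence adds little novelty, while scores near 0 suggest it may have drifted away from the original, so a practical filter would typically keep sentences within a middle band. That band is a design choice not taken from the paper.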
